Comparison Metrics for Large Scale Political Event Data Sets

Author

  • Philip A. Schrodt
Abstract

This paper addresses three general issues surrounding the use of political event data generated by fully automated methods in forecasting political conflict. I first look at the differences between the data generation processes for machine-coded and human-coded data, where I believe the major difference in contemporary efforts lies not in the precision of the coding but in the effects of using multiple sources. While the use of multiple sources has virtually no downside in human coding, it has great potential to introduce noise in automated coding. I then propose a metric for comparing event data sources based on the correlations between weekly event counts in the CAMEO "pentaclasses", weighted by the frequency of dyadic events, and illustrate it with two examples:

  • A comparison of the new ICEWS public data set with an unpublished data set based only on the BBC Summary of World Broadcasts.
  • A comparison of the TABARI shallow parser and the PETRARCH full parser for the 35-year KEDS Reuters and Agence France Presse Levant series.

In the ICEWS/BBC comparison, the metric appears useful not only in showing the overall convergence (typical weighted correlations are in the range of 0.45, surprisingly high given the differences between the two data sets) but also in showing variations across time and regions. In the TABARI/PETRARCH comparison on the KEDS series, the metric shows high convergence for the series with a large number of reports, and also shows that the PETRARCH coding reduces the number of material conflict events by around a factor of 2 in most dyads, presumably mostly by eliminating false positives. In both tests, the metric is good at identifying anomalous dyads: Asia in the case of ICEWS and Palestine in the case of the TABARI-coded Levant series. The paper concludes with a prioritized list of issues where further research and development is likely to prove productive.
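The abstract describes the metric only in outline, so the following is a minimal sketch of one plausible reading, not the paper's exact formula. It assumes that each source supplies weekly event counts per dyad and per pentaclass, that the per-dyad score is the mean Pearson correlation across the pentaclasses both sources share, and that each dyad is weighted by its total event count in both sources. The function names, the data layout, and the dyad and class labels are all hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # assumption: a flat series contributes no correlation
    return sxy / sqrt(sxx * syy)


def weighted_correlation(counts_a, counts_b):
    """Frequency-weighted mean correlation between two event data sources.

    counts_a, counts_b: {dyad: {pentaclass: [weekly event counts]}}
    Only dyads and pentaclasses present in both sources are compared.
    """
    weighted_sum = 0.0
    weight_total = 0
    for dyad, classes_a in counts_a.items():
        classes_b = counts_b.get(dyad)
        if classes_b is None:
            continue
        shared = [c for c in classes_a if c in classes_b]
        if not shared:
            continue
        # Mean correlation across the shared pentaclasses for this dyad.
        mean_r = sum(pearson(classes_a[c], classes_b[c]) for c in shared) / len(shared)
        # Assumed weight: total events for the dyad across both sources.
        n_events = sum(sum(classes_a[c]) + sum(classes_b[c]) for c in shared)
        weighted_sum += n_events * mean_r
        weight_total += n_events
    return weighted_sum / weight_total if weight_total else 0.0


# Illustrative (made-up) weekly counts for two dyads and one pentaclass.
source_a = {
    ("ISR", "PSE"): {"material_conflict": [1, 2, 3, 4]},
    ("USA", "CHN"): {"material_conflict": [1, 2, 3, 4]},
}
source_b = {
    ("ISR", "PSE"): {"material_conflict": [2, 4, 6, 8]},
    ("USA", "CHN"): {"material_conflict": [4, 3, 2, 1]},
}
score = weighted_correlation(source_a, source_b)
```

In this toy input the first dyad agrees perfectly and the second is perfectly anti-correlated, so the score is pulled toward whichever dyad has more events. That is the intended effect of frequency weighting: sparse dyads, where weekly correlations are noisy, influence the overall figure less.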

Similar resources

A Framework for Optimal Attribute Evaluation and Selection in Hesitant Fuzzy Environment Based on Enhanced Ordered Weighted Entropy Approach for Medical Dataset

Background: In this paper, a generic hesitant fuzzy set (HFS) model for clustering various ECG beats according to weights of attributes is proposed. A comprehensive review of the electrocardiogram signal classification and segmentation methodologies indicates that algorithms which are able to effectively handle the nonstationarity and uncertainty of the signals should be used for ECG analysis. Ex...

Constructive Dynamisms of Large-Scale Urban Projects by the Space Political Economy Approach; a Case Study of Mashhad Metropolis

Aims: The development of large-scale construction projects has transformed the shape of cities towards specific objectives and based on economic and political perspectives that dominate policy-making and planning in cities. The purpose of the research was to study and analyze the spatiality of Mashhad construction mega-projects and to explain the constructive forces and dynamisms of these proje...

Sidelines: An Algorithm for Increasing Diversity in News and Opinion Aggregators

Aggregators rely on votes and links to select and present subsets of the large quantity of news and opinion items generated each day. Opinion and topic diversity in the output sets can provide individual and societal benefits, but simply selecting the most popular items may not yield as much diversity as is present in the overall pool of votes and links. In this paper, we define three diversit...

Taxonomy of Global Air Transport

Data from the United Nations and the International Civil Aviation Organization Information Systems were used as a base for characterizing, classifying and comparing air transport demand and supply features of 156 countries. Relevant data from 1980 were chosen to reflect five sets of characteristics, namely air transport, socio-economic status, population demography, geographical and environmenta...

Publication date: 2015